ABSTRACT

The Yin Yang 1 (YY1) transcription factor is a master
regulator of development, essential for early embryogenesis
and adult tissue formation. YY1 is the mammalian orthologue
of Pleiohomeotic, one of the transcription factors that bind
Polycomb DNA response elements in Drosophila melanogaster
and mediate Polycomb group (PcG) protein recruitment to DNA.
Although several publications point to a similar role for
YY1 in mammals, others have reported features of YY1 that
are not compatible with PcG functions. Here, we show that,
in mouse embryonic stem (ES) cells, YY1 has genome-wide
PcG-independent activities while it is still stably
associated with the INO80 chromatin-remodeling complex, as
well as with novel RNA helicase activities. YY1 binds
chromatin in close proximity to the transcription start site
of highly expressed genes. Loss of YY1 function
preferentially led to down-regulation of target gene
expression, as well as to up-regulation of several small
non-coding RNAs, suggesting a role for YY1 in regulating
small RNA biogenesis. Finally, we found that YY1 is a novel
player among Myc-related transcription factors and that its
coordinated binding at promoters potentiates gene
expression, proposing YY1 as an active component of the Myc
transcription network that links ES to cancer cells.

INTRODUCTION 

Yin Yang 1 (YY1) is a DNA binding transcription factor
discovered 20 years ago as the main binding factor, induced
by the adenoviral protein E1, of the adeno-associated virus
(AAV) promoter region; it takes its name from the dual
activity of the AAV promoter (1). YY1 is also the mammalian
orthologue of pleiohomeotic (pho), one of the DNA binding
transcription factors that mediate Polycomb Group (PcG)
protein binding at the Polycomb Response Elements (PRE) of
the Drosophila melanogaster genome (2).

PcG proteins have a key role in early embryogenesis. They
are master regulators of organism development that control
cell fate by maintaining repression of their target genes,
in part through their ability to modify histone proteins in
the vicinity of their binding sites (3). Until now, very few
DNA binding factors have been described as able to recruit
PcG proteins to specific chromatin sites, and YY1 is one of
the best candidates (3). In fact, similar to PcG proteins,
YY1 activity is essential for mammalian development, as
YY1-null embryos die at the peri-implantation stages of
embryogenesis (4). YY1 activity is also necessary for adult
tissue development: for instance, oligodendrocyte-specific
depletion of YY1 causes serious neural defects, mainly due
to a global lack of nerve myelination (5). Moreover, reduced
YY1 expression in heterozygous knock out (KO) mice induces
serious growth retardation and proliferative and
neurological defects (6). Altogether, these data show the
critical role of YY1 in regulating several developmental
processes and highlight its similarities with PcG
activities.

Several reports proposed YY1 as a potential recruiting
factor for Polycomb activities in mammalian cells. For
example, YY1 was shown to directly interact with the
Polycomb Repressive Complex 2 (PRC2) subunit Eed in
Burkitt's lymphoma cells (7), and to mediate PcG recruitment
during myoblast differentiation (8) and during muscle
regeneration from satellite stem cells (9). Moreover, YY1
binding sites were identified within a putative PRE isolated
in mammalian cells, and it was shown that, when the binding
sites are mutated, PRE responsiveness is affected (10).
Finally, recent data on X-chromosome inactivation proposed
YY1 as a DNA-RNA binding factor that links PcG-Xist to the
inactive X-chromosome (11).

Despite these observations, several data pointed to YY1
having PcG-independent functions. YY1 was shown to interact
with the INO80 complex in cancer cell lines and proposed to
have a positive effect on Cdc6 expression (12,13).
Similarly, the role of YY1 in nerve myelination was linked
to direct YY1 binding at the Egr2 promoter and to activation
of Egr2 expression in Schwann cells (14). In line with this,
it was shown that YY1 interacts with several other
transcription factors often linked with transcriptional
activation (6). In addition, YY1 was reported to control p53
levels in a DNA-independent manner (15) and to bind the
cruciform structure of Holliday junctions, suggesting a role
in DNA repair via homologous recombination that is
consistent with the genomic instability observed in
YY1-deficient fibroblasts (13). Many of these studies are
based on in vitro or non-physiological observations, often
on experiments performed on single genes without determining
a direct YY1 association. Thus, these data do not completely
clarify YY1 functions and, in particular, do not fully
address the real transcriptional nature of YY1. We therefore
believe that a detailed analysis of YY1 activity in a
biologically relevant system is needed to define YY1
functions at a genome-wide level. Due to the essential role
of YY1 in early embryogenesis (4) and the high degree of
similarity with the phenotypes observed in mice mutant for
different PcG proteins (4,16-19), we decided to characterize
YY1 functions in mouse embryonic stem (ES) cells.

ES cells are the tissue culture adaptation of the cells that
form the inner cell mass of the blastocyst (20). Mouse ES
cells can be expanded through active BMP and STAT3
signaling, which maintains the ES pluripotent state by
preserving their potential to give rise to all cells of an
adult organism (20). Such signaling stimulates the activity
of several transcription factors that are required for their
maintenance and differentiation (21). The same transcription
factors are also actively involved in the reprogramming of
committed cells to a pluripotent ES-like state (22),
highlighting how the activity of these proteins is essential
for ES cell identity. Kim and colleagues (23) recently
dissected this transcriptional network and identified three
distinct transcription modules: a PcG module, strictly
linked to transcriptional repression; a core module, made of
transcription factors that directly respond to the BMP-STAT
signaling pathway (Oct4-Nanog-Sox2); and a Myc module, made
of transcription factors such as c-Myc, n-Myc, E2fs and Zfx.
In this work, the authors showed that the Myc module, but
not the core module, is responsible for the previously
identified ESC-like transcriptional features of cancer
cells. Moreover, they proposed that high activity of the ES
Myc module predicts a poor outcome in different kinds of
human tumors. Consistent with this, Myc, like PcG proteins
and YY1, is frequently over-expressed in cancers, and
several studies demonstrated direct oncogenic effects in
mediating normal cell transformation and tumor development
(6,24,25).

In the present study, we show at a genome-wide level that
YY1 exerts PcG-independent functions in ES cells. As in
cancer cells, YY1 is associated with all components of the
INO80 chromatin-remodeling complex, as well as with newly
identified partners with RNA helicase activity. YY1 is
preferentially associated with hyper-acetylated promoters
with high transcriptional activity. Loss of YY1 function in
ES cells predominantly diminished mRNA expression while
increasing the expression levels of small non-coding RNAs
such as small nuclear, nucleolar and micro RNAs. In
addition, we identified components of the Myc transcription
module as potential cooperating factors at YY1 sites and
demonstrated that YY1 binding is prevalently associated with
promoters co-occupied by other transcription factors such as
c-Myc, n-Myc, Zfx and E2f1 at a genome-wide level. Finally,
we show that a coordinated occupancy of YY1 with the Myc
transcription module correlates with an increased expression
of target genes, proposing YY1 as a partner of the Myc
network.



MATERIALS AND METHODS 

Cell lines generation, manipulation and culturing 

All ES cell lines were grown on 0.1% gelatinized tissue
culture dishes in DMEM supplemented with 15% Serum
(Euroclone), Leukemia Inhibitory Factor (produced in house),
Penicillin-Streptomycin (Gibco), non-essential amino acids
(Gibco) and Na-Pyruvate (Gibco). BirA-expressing ES cell
clones were generated from an ES cell line described
elsewhere (26) by removing the puromycin selection cassette
used for targeting purposes by transient CRE recombinase
expression. The expression constructs for FBio-Ezh2 and
FBio-YY1 were generated by LR recombination of the YY1 and
EZH2 coding sequences from a pCR8 Gateway entry vector into
a pCAG-Flag-Avi-ires-Puromycin Gateway-compatible
destination vector using LR recombinase (Invitrogen). Stable
cell lines were obtained by transient transfection of FBio
empty, YY1 and Ezh2 expression constructs in BirA-ES cells
using Lipofectamine 2000 (Invitrogen) and stable selection
with 2 µg/ml of puromycin. RNA interference (RNAi)
experiments were carried out by transfecting short
interfering RNA (siRNA) oligos specific for YY1
(Sigma-Aldrich SASI_Mm01_00125709) or with a
control-scrambled (SCR) sequence (Sigma-Aldrich SIC001)
using Lipofectamine 2000 at a concentration of 50 nM. Cells
were harvested 48 hours post-transfection and RNA was
isolated by TRIzol (Invitrogen) extraction. Stable shRNA
knock down was obtained by ES cell transduction with viral
particles produced with the LKO.1 vectors TRCN0000054556
(shYY1) and SHC202 (shSCR) purchased from Sigma-Aldrich.

Antibodies 

Western blot analyses were performed using antibodies
against: YY1 (Santa Cruz, Cat. sc-281), Suz12 (Santa Cruz,
Cat. sc-46264), Actr8 (Sigma-Aldrich, Cat. A2107), Ddx5
(Abcam, Cat. ab10261), Ddx3x (Millipore, Cat. #09-860),
Vinculin (Sigma-Aldrich, Cat. V9131), HA (Santa Cruz, Cat.
sc-805), Oct4 (Abcam, Cat. ab19857), β-Tubulin (Santa Cruz,
Cat. sc-9104), Biotin (Pierce, Cat. 31852). Ezh2 and Eed
were described elsewhere (18). Ruvbl2 was also previously
described (27).

Immunoprecipitation and Chromatin-Immunoprecipitation (ChIP)
analyses were carried out using antibodies against: YY1
(Santa Cruz, Cat. sc-281), Suz12 (Cell Signaling, Cat.
3737), cMyc (N-262) (Santa Cruz, Cat. sc-764) and E2f1
(C-20) (Santa Cruz, Cat. sc-193). Rabbit IgG (Sigma, Cat.
15006) were used as negative control.
Tandem affinity purification and Mass spectrometry analysis

All protein purifications were carried out on ES cell nuclei
prepared by 20 min swelling in nuclear prep buffer (10 mM
Tris, 100 mM NaCl, 2 mM MgCl2, 0.3 M Sucrose, 0.25% v/v
Igepal) at 4°C. Nuclei were lysed in high salt buffer (50 mM
Tris-HCl pH 7.5, 300 mM NaCl, 10% glycerol, 0.25% Igepal)
with fresh addition of a protease inhibitor cocktail
(Roche). Direct streptavidin purifications were carried out
by overnight (ON) incubation of 25 µl of streptavidin
magnetic beads (Invitrogen, Cat. 656-01) for each milligram
of protein extract. The tandem affinity purifications were
performed by incubating ~20 mg of nuclear protein extract
with 200 µl of packed anti-Flag agarose beads (Sigma, Cat.
A2220) ON at 4°C on a rotating platform. Beads were washed
six times in a minimum of 10 bead volumes of high salt
buffer at 4°C, and protein complexes were eluted four times
for 30 min with 0.5 mg/ml of Flag peptide (DYKDDDDK) in high
salt buffer at 20°C. Eluates were pooled and further
precipitated with 100 µl of streptavidin magnetic beads
(Invitrogen, Cat. 656-01) ON at 4°C. Streptavidin beads were
washed six times as before at 4°C and protein complexes were
eluted with Laemmli sample buffer (Invitrogen).

Gel electrophoresis and in-gel digestion 

Proteins from both FBio-YY1 and FBio empty vector control
purifications were separated by 1D SDS-PAGE, using 4-12%
NuPAGE® Novex Bis-Tris gels (Invitrogen) and NuPAGE® MES SDS
running buffer (Invitrogen) according to the manufacturer's
instructions. The gel was stained with Coomassie Blue using
the Colloidal Blue Staining Kit (Invitrogen). Samples were
digested with trypsin (Promega). Briefly, the gel bands were
cut and then washed four times with 50 mM ammonium
bicarbonate, 50% ethanol and incubated with 10 mM DTT in
50 mM ammonium bicarbonate for 1 h at 56°C for protein
reduction. The alkylation step was performed by incubating
the sample with 55 mM iodoacetamide in 50 mM ammonium
bicarbonate for 1 h at 25°C in the dark. Gel pieces were
washed twice with a 50 mM ammonium bicarbonate, 50%
acetonitrile solution, dehydrated with 100% ethanol and
dried in a vacuum concentrator. Digestion was performed by
incubating the gel pieces with 12.5 ng/ml trypsin in 50 mM
ammonium bicarbonate for 16 h at 37°C. The supernatant was
transferred to a fresh tube, and the remaining peptides were
extracted by incubating gel pieces twice with 30%
acetonitrile (MeCN) in 3% trifluoroacetic acid (TFA),
followed by dehydration with 100% acetonitrile. The extracts
were combined, reduced in volume in a vacuum concentrator,
desalted and concentrated using RP-C18 StageTip columns, and
the eluted peptides were used for mass spectrometric
analysis (28).

Mass spectrometry analysis 

Peptide mixtures were separated by nano-LC/MSMS using an
Agilent 1100 Series nanoflow LC system (Agilent
Technologies), interfaced to a 7-Tesla LTQ-FT-Ultra mass
spectrometer (ThermoFisher Scientific, Bremen, Germany). The
nanoliter flow LC was operated in a one-column set-up with a
15-cm analytical column (75 µm inner diameter, 350 µm outer
diameter) packed with C18 resin (ReproSil, Pur C18AQ 3 µm,
Dr Maisch, Germany). Solvent A was 0.1% FA and 5% ACN in
ddH2O and Solvent B was 95% ACN with 0.1% FA. Samples were
injected in an aqueous 0.1% TFA solution at a flow rate of
500 nl/min. Peptides were separated with a gradient of 0-40%
Solvent B over 90 min followed by a gradient of 40-60% for
10 min and 60-80% over 5 min at a flow rate of 250 nl/min.
The mass spectrometer was operated in a data-dependent mode
to automatically switch between mass spectrometry (MS) and
MS/MS acquisition. In the LTQ-FT, full scan MS spectra were
acquired in a range of m/z 300-1700 by FTICR with resolution
r = 100 000 at m/z 400 with a target value of 1 000 000. The
five most intense ions were isolated for fragmentation in
the linear ion trap using collision-induced dissociation at
a target value of 5000. Singly charged precursor ions were
excluded. In the MS/MS method, a dynamic exclusion of 60 s
was applied and the total cycle time was ~2 s. The
nanoelectrospray ion source (Proxeon, Odense, Denmark) was
used with a spray voltage of 2.4 kV. No sheath and auxiliary
gases were used and the capillary temperature was set to
180°C. Collision gas pressure was 1.3 millitorrs and
normalized collision energy using wide band activation mode
was 35%. Ion selection threshold was 250 counts with an
activation q = 0.25. An activation time of 30 ms was applied
in MS2 acquisitions.

Data analysis and assigning sequences using MASCOT 

The raw data from LTQ-FT Ultra were converted to mgf files
using Raw2MSM software (29). The MS/MS peak lists were
filtered to contain at most six peaks per 100 Dalton
interval and searched by Mascot Daemon (version 2.2.2,
Matrix Science) against a concatenated forward and reversed
version of the IPI mouse database (version 6.63) (56073
sequences; 25 214 299 residues) (30). This database was
complemented with frequently observed contaminants (porcine
trypsin and human keratins) and their reversed sequences as
well. Search parameters were: an initial MS tolerance of
7 ppm, an MS/MS mass tolerance of 0.5 Da and full trypsin
cleavage specificity, allowing for up to two missed
cleavages. Carbamidomethylation of cysteine was set as a
fixed modification and variable modifications included
oxidation of methionine and acetylation of the N-terminus of
proteins. We accepted peptides and proteins with a false
discovery rate (FDR) of <1%, estimated based on the number
of accepted reverse hits (31).
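
The idea of estimating the FDR from accepted reverse hits can be illustrated with a minimal Python sketch. The function name, the example scores and the simple decoy-to-target ratio are assumptions made for illustration; the exact estimator used in (31) may differ.

```python
def decoy_fdr(hits, threshold):
    """Estimate the FDR at a score threshold as decoy hits / target hits.

    hits: iterable of (score, is_decoy) pairs, where is_decoy is True for
    matches to the reversed part of the concatenated database.
    """
    accepted = [is_decoy for score, is_decoy in hits if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical peptide-spectrum matches: (score, matched a reversed sequence?)
example_hits = [(60, False), (55, False), (52, False),
                (48, True), (45, False), (30, True)]

print(decoy_fdr(example_hits, 40))
print(decoy_fdr(example_hits, 50))
```

In practice the score threshold would be raised until the estimate falls below the chosen cut-off (here <1%).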

ChIP, bioChIP and high-throughput sequencing 

ChIP assays were carried out as described previously (32).
Briefly, 1% formaldehyde cross-linked chromatin was
fragmented by sonication to an average size of 200-350 bp
and immunoprecipitated ON with 10 µg of the indicated
antibodies. For bioChIP, 25 µg of streptavidin beads were
added instead of the antibodies, following the protocol
described in (33). DNA samples were sequenced on an Illumina
Genome Analyzer II. Short reads of about 36 bp were then
mapped onto the mm9 release of the mouse genome using Bowtie
(34). The alignments were performed allowing zero to two
mismatches and keeping only the reads that align to unique
positions in the genome. The YY1 sample was compared with
the control DNA using Model-based Analysis for ChIP-Seq
[MACS, (35)]. Wiggle tracks for visualization on the UCSC
genome browser (36) were generated using MACS. Gene Interval
Notator [GIN, (37)] was then used to annotate peaks over
RefSeq mouse genes. A peak was assigned to the
transcriptional start site (TSS) of a RefSeq gene when
falling into the surrounding 4 kb (±2 kb). Datasets are
available for download from NCBI's Gene Expression Omnibus
(GEO, http://www.ncbi.nlm.nih.gov/geo) under accession
number GSE31786. Raw data from previously published ChIPseq
datasets were aligned to the mm9 release following the same
criteria. Raw ChIPseq data and relative negative controls
were obtained from the following GEO accession numbers:
Jarid2 GSE19365; Ezh2 and Ring1b GSE13084; H3K27me3 and
H3K4me3 GSE12241; H3K27ac GSE24164; cMyc, nMyc, Zfx, E2f1,
A Oct4 and Sox2 GSE11431; B Oct4 and Nanog GSE11724.
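
The rule that assigns a peak to a TSS when it falls within ±2 kb can be sketched in a few lines of Python. This is only an illustration and not the GIN implementation: the function name, the data layout and the use of a single position per peak (for instance its summit) are assumptions.

```python
def assign_peaks_to_tss(peak_positions, tss_by_gene, window=2000):
    """Return {gene: [peak positions]} for peaks within +/- window bp of a TSS.

    peak_positions: list of (chromosome, position) pairs, one per peak.
    tss_by_gene: dict mapping gene name -> (chromosome, TSS position).
    """
    assigned = {}
    for gene, (tss_chrom, tss_pos) in tss_by_gene.items():
        near = [pos for chrom, pos in peak_positions
                if chrom == tss_chrom and abs(pos - tss_pos) <= window]
        if near:
            assigned[gene] = near
    return assigned


# Hypothetical peaks and TSS annotations
peaks = [("chr1", 1500), ("chr1", 10000), ("chr2", 1500)]
tss = {"GeneA": ("chr1", 1000), "GeneB": ("chr2", 5000)}
print(assign_peaks_to_tss(peaks, tss))
```

A gene with at least one assigned peak would then count as a target gene.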

De novo motif discovery 

Multiple EM for Motif Elicitation [MEME, (38), version
4.4.0] was used to search for frequently occurring patterns
in the DNA sequence underlying the putative binding sites.
The analysis was narrowed to the 50 bp (±25 bp) around the
peak summit. All the identified putative binding sites were
included. The analysis was run on both strands, looking for
motifs with zero or one occurrence per sequence (zoops),
ranging from 6 to 16 bp in length.
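
As a minimal sketch of the window selection step, the following Python functions extract the 50 bp (±25 bp) around a summit from a sequence string. The function names and the 0-based, end-exclusive coordinate convention are assumptions; the motif search itself is left to MEME.

```python
def summit_window(summit, flank=25):
    """Return (start, end) of the window around a summit, clipped at 0."""
    start = max(0, summit - flank)
    return start, summit + flank


def window_sequence(chrom_seq, summit, flank=25):
    """Slice the sequence underlying the window around a peak summit."""
    start, end = summit_window(summit, flank)
    return chrom_seq[start:end]


# Hypothetical toy sequence and summit position
toy_sequence = "ACGT" * 20
print(summit_window(1000))
print(window_sequence(toy_sequence, 40, flank=4))
```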

Motif analysis 

Position-specific weight matrices (PWMs) were collected from
the literature (39-43), and used to build a custom set of
597 models. The YY1 putative binding site identified through
de novo motif discovery was added to this set. For some
analyses, PWMs were clustered using BLiC (44). In this way
we could reduce the complexity of our results using a
non-redundant set of 229 PWMs.

In order to identify over-represented PWMs in the YY1
putative binding sites, proximal regions (±2.5 kb from a
RefSeq TSS) were analyzed using Clover (45). The DNA
sequences underlying the YY1 peaks were scanned for all the
PWMs in the redundant set. Over-representation was
statistically evaluated using three independent background
sets, namely the entire chromosome 19, all the RefSeq TSSs
(±2.5 kb) and all the CpG islands annotated in the mm9
genome. A PWM was retained only when significantly
over-represented (P < 0.01) compared with all of these
backgrounds. Clover is available as a standalone tool, while
results were parsed using a custom Python script.
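
The retention criterion (P < 0.01 against every background) can be expressed as a short filter. This is a hypothetical sketch, not the authors' parsing script: the dictionary layout, PWM names and P-values are invented for illustration.

```python
def retained_pwms(pvalues, alpha=0.01):
    """Keep PWMs whose P-value is below alpha against every background set.

    pvalues: dict mapping PWM name -> {background name: P-value}.
    """
    return sorted(name for name, by_background in pvalues.items()
                  if all(p < alpha for p in by_background.values()))


# Hypothetical P-values against the three backgrounds
example_pvalues = {
    "PWM_1": {"chr19": 0.001, "refseq_tss": 0.002, "cpg_islands": 0.004},
    "PWM_2": {"chr19": 0.001, "refseq_tss": 0.200, "cpg_islands": 0.003},
    "PWM_3": {"chr19": 0.009, "refseq_tss": 0.009, "cpg_islands": 0.010},
}
print(retained_pwms(example_pvalues))
```

Requiring significance against all three backgrounds is the conservative choice: a motif that is merely typical of promoters or CpG islands is filtered out.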

Regions bound by both c-Myc and n-Myc (from now on referred
to as Myc) were intersected with the YY1 proximal peaks. In
this way we defined three sets, namely the YY1-bound
Myc-unbound, the YY1-bound Myc-bound and the YY1-unbound
Myc-bound. For each different class of genes, we used Pscan
(46) to detect statistically significant over-represented
PWMs against a background dataset consisting of the three
sets pooled together. In this case, the non-redundant set
was used. A PWM was considered significantly
over-represented when it showed P < 0.01 (two-tailed Welch's
t-test). The Pscan source code was modified in order to
replace the statistical evaluation step based on the z-test
with a step based on the t-test. The t-test is more suitable
than the z-test when comparing datasets with similar
cardinality (46). In order to get a graphical representation
of the results, PWMs that were found significant in at least
one class were retained. Values were log10-transformed and
hierarchically clustered using average linkage and Pearson
correlation as distance measure. A heat map was then drawn
using this information. Pscan is available as a standalone
application, whereas the clustering and the heat map were
produced using R.
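
For readers unfamiliar with Welch's t-test, the statistic and its degrees of freedom can be computed with the Python standard library as sketched below. This is not the modified Pscan code; the function name and the example scores are assumptions, and the P-value step (which needs the t distribution) is omitted.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (unequal variances allowed)
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical PWM scores in a gene class and in the pooled background
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_stat, dof)
```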

Microarray and micro RNA analysis 

RNA from two independent RNAi experiments was hybridized
independently to Mouse Gene 1.0 ST Affymetrix Arrays.
Signals were RMA normalized, and probesets with a 1.3-fold
expression difference and a 95% confidence determined by
ANOVA were selected for the analyses. Datasets are available
for download from NCBI's Gene Expression Omnibus (GEO,
http://www.ncbi.nlm.nih.gov/geo) under accession number
GSE31786.
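
The probeset selection rule can be sketched as a simple filter. This is an illustrative assumption-laden example: it treats the 95% confidence as P < 0.05, assumes linear-scale (not log2) expression values, and uses invented probeset names and numbers.

```python
def select_probesets(expression, min_fold=1.3, alpha=0.05):
    """Return {probeset: 'up' or 'down'} for probesets passing both filters.

    expression: dict mapping probeset -> (control mean, knock-down mean, P-value),
    with means on a linear scale.
    """
    selected = {}
    for probeset, (control, knockdown, p_value) in expression.items():
        fold = knockdown / control
        if p_value < alpha and (fold >= min_fold or fold <= 1 / min_fold):
            selected[probeset] = "up" if fold > 1 else "down"
    return selected


# Hypothetical probesets: (siSCR mean, siYY1 mean, ANOVA P-value)
example_expression = {
    "probe_a": (100, 140, 0.01),
    "probe_b": (100, 120, 0.01),
    "probe_c": (100, 60, 0.20),
    "probe_d": (140, 100, 0.04),
}
print(select_probesets(example_expression))
```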

The micro RNA (miRNA) expression was determined using the
TaqMan® Rodent MicroRNA A+B Cards Set v2.0 following the
manufacturer's procedures. miRNAs with a 2-fold expression
difference and a 90% t-test confidence were selected for the
analyses.

Density profiles clusters 

Density profile clusters were generated using SeqMINER (47)
K-means ranked clustering within a 4-kb region centered on
the peak summits. Density values were generated using a
50-bp window.
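
How density values over a 4-kb region with a 50-bp window might be computed is sketched below. This is not SeqMINER's implementation: the function name, the use of a single position per read and the plain read count per window are assumptions, and the K-means clustering step is not shown.

```python
def density_profile(read_positions, summit, half_width=2000, bin_size=50):
    """Count reads per bin in a window of 2 * half_width bp centered on a summit."""
    n_bins = (2 * half_width) // bin_size
    counts = [0] * n_bins
    start = summit - half_width
    for position in read_positions:
        offset = position - start
        if 0 <= offset < 2 * half_width:
            counts[offset // bin_size] += 1
    return counts


# Hypothetical read positions around a summit at 10000
profile = density_profile([10000, 10010, 8000, 11999, 12000, 7999], 10000)
print(len(profile), profile[40])
```

Each peak then yields one vector of 80 values, and vectors from many peaks can be clustered.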

Real Time quantitative PCR 

RT-qPCRs were carried out using Fast SYBR Green as
previously described (48). Primers used for PCRs are listed
in Supplementary Table S7.

Functional annotations 

mRNA and miRNA functional annotations were generated using
Ingenuity Systems Pathway Analysis (IPA; www.ingenuity.com).
The miRNA target gene annotation was generated against the
validated miRNA database of IPA.
RESULTS

In order to test if YY1 shares regulatory functions with PcG
proteins, we performed ChIP analysis in E14 ES cells using
antibodies specific for mouse YY1 and a subunit of the
Polycomb Repressive Complex 2 (PRC2), Suz12. As shown in
Figure 1A, while Suz12 was strongly enriched at known PcG
binding sites, the YY1 antibody did not show any significant
enrichment. Such a result indicates either that YY1 does not
associate with these genomic sites or that the YY1 antibody
is not efficient in ChIP assays. In order to bypass these
technical issues, we decided to develop a purification
strategy that takes advantage of the in vivo biotinylation
of proteins in mouse ES cells (49). This technique involves
the constitutive expression of proteins of interest bearing
a tag (Flag-Avi; FBio) recognized by the biotinylating
enzyme BirA (Supplementary Figure S1A), allowing in vivo
biotinylation of the tagged protein (33). Such a system has
been previously used for native protein complex
purification, as well as for ChIP assays (49-51). For this,
we took advantage of ES cells that carry the coding sequence
of the BirA enzyme knocked into the Rosa26 locus, which
drives constitutive BirA expression (BirA-ES) (26). As shown
in Supplementary Figure S1B, these cells constitutively
express physiological levels of a hemagglutinin-tagged
version of the BirA enzyme and normal levels of the
pluripotency marker Oct4. Thus, we generated stable cell
lines independently expressing a biotinylated (bio) form of
YY1 and of Ezh2, the catalytic subunit of the PRC2 complex
(bioYY1 and bioEzh2) (Supplementary Figure S1D and E). With
these cells, we performed a streptavidin co-precipitation
experiment using optimal extraction conditions
(Supplementary Figure S1C) and demonstrated that neither
endogenous YY1 nor bioYY1 co-precipitated components of the
PRC2 complex, whereas the bioEzh2 protein efficiently
co-precipitated endogenous core PRC2 subunits (Eed and
Figure 1. Genome-wide localization of YY1 does not correlate with PcG proteins. (A) ChIP analysis using qPCR on the indicated genomic loci using
the specified antibodies. (B) Western blot analyses using the indicated antibodies of streptavidin pull-down assays from protein extracts of ES cell
lines independently expressing bioEzh2 and bioYY1. A control ES cell line with BirA expression alone is presented as purification control. Dotted
line denotes removal of non-relevant lanes from the original blot. (C) BioChIP analysis by qPCR on chromatin prepared from the same cell lines
presented in (B). (D) Genomic snapshots of the bioChIPseq results for YY1 and control (FBio-Ctrl) BirA expressing ES cells. (E) Overlap of binding
sites between bioYY1 and the indicated ChIPseq datasets. (F and G) Overlap between target genes of the indicated ChIPseq datasets. Target genes
are defined by the presence of at least one peak within ±2 kb from the annotated TSS of RefSeq genes. P-values of the indicated overlaps are determined
by hypergeometric distribution.
Suz12) (Figure 1B). Similar results were obtained using
antibodies against endogenous proteins, further
demonstrating the lack of interaction between YY1 and
components of the PRC2 complex in ES cells (Supplementary
Figure S1F). Using these cell lines, we also performed
streptavidin ChIP (bioChIP) analyses at known PRC2 binding
sites. Consistent with the data in Figure 1A, we found that,
although bioEzh2 was efficiently enriched at PcG binding
sites, bioYY1 was not (Figure 1C). Overall, these data
confirm that YY1 does not interact with the PRC2 complex in
ES cells and does not associate with the tested PcG binding
sites.

In order to generate a genome-wide map of the bioYY1 binding
profile to DNA, we performed high-throughput sequencing of
the DNA enriched in bioYY1 ChIP (ChIPseq) (Figure 1E).
Enrichment analysis of the bioYY1 ChIPseq, relative to a
control BirA-ES cell line, revealed strong enrichment sites
of bioYY1 in proximity to TSSs (Figure 1D). Indeed, the
overlap of all bioYY1 peaks relative to promoter regions,
defined as ±2 kb from TSS, showed that ~80% of bioYY1 peaks
were found at gene promoters (Figure 1E). Consistent with
this, bioYY1 binding showed a similar degree of overlap with
CpG islands, a typical feature of promoter regions (Figure
1E). Conversely, when compared with previously generated
ChIPseq datasets for PcG proteins in ES cells, bioYY1
binding sites showed almost no overlap (<5%) with components
of the PRC2 (Ezh2, Jarid2) and PRC1 (Ring1b) complexes, or
with regions of accumulation of repressive tri-methylated
(me3) histone H3 (H3) lysine (K) 27, which is the product of
PRC2 enzymatic activity (Figure 1E) (25). This result is not
due to a bias of chromatin accessibility in bioYY1 BirA-ES
cells, since both H3K27me3 and unrelated genomic loci such
as ES cell-specific enhancers or intra-genic and inter-genic
regions are efficiently immuno-precipitated with H3K27me3
and Histone H3-specific antibodies (Supplementary Figure
S2A). These data are consistent with the observations
presented in Figure 1A-C. Moreover, bioYY1 peaks strongly
overlapped (>80%) with H3K4me3 and H3K27 acetylated (ac)
regions (Figure 1E): while H3K4me3 was shown to form
'bivalent domains' of poised chromatin with H3K27me3 (52),
H3K27ac was demonstrated to be mutually exclusive with
H3K27me3 in ES cells (53). Moreover, while Ring1b and Ezh2
shared nearly their entire set of targets, they did not
significantly overlap bioYY1-bound promoters (Figure 1F).
Consistent with this, bioYY1 target genes were not enriched
for H3K27me3 but strongly enriched for H3K4me3,
demonstrating that bioYY1 targets are not bivalent genes and
further suggesting an association with actively transcribed
promoters (Figure 1G). A more detailed analysis of the
distribution profile of bioYY1 binding sites confirmed that
bioYY1 is associated preferentially with promoter regions,
to a lesser extent with intra-genic regions and, only for
the remaining ~16% of binding sites, with inter-genic
regions (Figure 2A). Interestingly, most of these intra- and
inter-genic bioYY1 binding sites did not overlap
significantly with CpG islands or with recently identified
ES enhancer regions (<10%, data not shown) (54).
Furthermore, the analysis of the density profiles of the
distance between peak summits and gene TSSs showed that
bioYY1 was strongly enriched in close proximity to
transcriptional initiation sites (Figure 2B), in agreement
with the examples presented in Figure 1D.
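
The legend of Figure 1 states that the P-values of target gene overlaps are determined by hypergeometric distribution. A minimal sketch of such a test is given below; the function name and the toy numbers are assumptions, and the gene universe actually used for the figure is not specified here.

```python
from math import comb


def hypergeom_overlap_pvalue(total, set_a, set_b, overlap):
    """Probability of observing at least `overlap` shared genes by chance.

    total: number of genes in the universe.
    set_a, set_b: sizes of the two target gene sets.
    overlap: number of genes found in both sets.
    """
    upper = min(set_a, set_b)
    tail = sum(comb(set_a, k) * comb(total - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / comb(total, set_b)


# Toy example: two sets of 5 genes out of 10 that coincide completely
print(hypergeom_overlap_pvalue(10, 5, 5, 5))
```

A small P-value indicates more overlap than expected by chance; testing for less overlap than expected would use the lower tail instead.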

Comparison of ES cell microarray expression analyses with
bioYY1 target genes demonstrated that bioYY1 is directly
associated with the promoters of genes with a high level of
transcriptional activity (Figure 2C). Indeed, functional
annotation of bioYY1 target genes revealed a strong
enrichment in highly expressed genes, such as genes encoding
proteins involved in RNA biogenesis, protein synthesis and
mitochondrial functions, involved particularly in embryonic
development (Supplementary Figure S3A-C). This result is
consistent with the presence of non-bivalent H3K4me3 and
with the high level of acetylation found on H3K27 at bioYY1
binding sites, suggesting a global activating role of YY1 in
regulating gene transcription. Finally, sequence analysis of
bioYY1-bound genomic regions identified a motif that
perfectly matches a known YY1 DNA binding site, strongly
suggesting that the genome-wide association of bioYY1 with
chromatin is directly mediated by its DNA binding activity
(Figure 2D).

In order to gain further insight into the functional
properties of YY1, we used bioYY1 BirA-ES cells to identify
YY1-specific interacting proteins. For this, we performed a
tandem purification using the Flag and biotin tags and
identified bioYY1-associated proteins in ES cells by mass
spectrometry (MS) (Figure 2E). This analysis revealed that
bioYY1 is stably associated with several components of the
INO80 chromatin-remodeling complex in ES cells, as
previously reported for cancer cells (12,13) (Figure 2E). In
addition, we identified novel interacting partners of
bioYY1, Ddx5 and Ddx3x, two proteins carrying RNA helicase
activity (Figure 2E). Consistent with the data in Figure 1,
no peptides of PcG proteins were found in the MS analysis.
These interactions were further validated in an independent
experiment by probing the product of a streptavidin
purification with specific antibodies against different
proteins identified in the MS analysis (Figure 2F). These
findings suggest that co-recruitment of the INO80 remodeling
complex and the RNA helicase activities could contribute to
promoting active transcription from promoters bound by YY1.

In order to validate endogenous YY1 binding at target sites
and the presence of its interacting partners, we performed
ChIP analyses in ES cells using antibodies specific for YY1
and Ruvbl2, a stable component of the INO80 complex. As
shown in Figure 2G, both YY1 and Ruvbl2 antibodies produced
a significant enrichment over the background signal at
several gene promoters, validating YY1 binding at the sites
identified by the bioChIPseq analysis and demonstrating the
co-recruitment of the INO80 complex at the same genomic
regions.

To gain further insights into the role of YY1 in
transcriptional regulation, we developed an efficient
siRNA-mediated down-regulation of YY1 expression in mouse
ES cells (Figure 2H). Using total RNA extracted from 
two independent RNAi experiments (Figure 2H), 
we determined, by means of Affymetrix microarrays, 
global gene expression changes upon acute YY1 



Nucleic Acids Research, 2012, Vol. 40, No. 8 3409 



[Figure 2, panels A-I: images not reproduced. Recoverable panel labels:
(A) promoters, 1747; (B) distance from TSS (kb), N = 2571, bandwidth = 10;
(C) non-targets (N = 32577) versus bioYY1 targets (N = 2979);
(D) DNA binding motif prediction, YY1 Transfac matrix and MEME prediction;
(F) input and streptavidin purification from FBio-Ctrl and FBio-Yy1 cells,
probed for Yy1, Actr8, Ruvbl2, Ddx5 and Ddx3x; (G) Yy1 ChIP;
(H) siSCR and siYy1 extracts probed for Yy1 and Vinculin;
(I) differentially expressed genes, Up and Down, p-value < 2.2e-16.]

Panel E, mass spectrometry summary of the FBio-Yy1 purification:

Protein    PAI
Ruvbl2     3.46
Nfrkb      2.5
Yy1        2.09
Ruvbl1     1.93
Actr8      1.15
Ino80b     0.86
Ino80      0.76
Actr5      0.67
Uchl5      0.65
Ino80c     0.58
Ino80d     0.29
Ino80e     0.16
Ddx5       0.32
Ddx3x      0.24



Figure 2. YY1 complex directly regulates active gene expression. (A) Distribution of bioYY1 binding sites relative to the gene bodies of RefSeq
annotated transcripts. (B) Density profile of bioYY1 binding sites relative to TSS. All binding sites within ±10 kb are included in the analysis. TSS
distance is measured as the relative base pair distance to peaks' summits. A close-up image of a ±500 bp TSS density profile is also presented with an
identical bandwidth (10 bp). (C) Annotation of gene expression levels between bioYY1 target and non-target genes. P-values are determined by
Wilcoxon test. (D) MEME motif prediction of DNA sequences enriched in bioYY1 ChIPseq. YY1 Transfac matrix is presented for comparison.
(E) Silver staining of the isolated proteins with a Flag-Streptavidin tandem purification using protein extracts of FBio-YY1 expressing ES cells (left).
A purification using BirA expressing ES cells is presented as negative control. A summary table of the mass spectrometry results is presented on the
right. Protein Abundance Index (PAI) is indicated as measure of purification efficiency. (F) Western blot analyses of streptavidin-purified proteins
from nuclear extracts of the indicated ES cell lines using the specified antibodies. Input lanes correspond to 2% of extract used in IPs. (G) ChIP
analysis of E14 ES cells using the indicated antibodies on the specified genes' TSS. (H) Western blot analysis of ES cell extracts independently
transfected with YY1-specific or scrambled (SCR) control siRNA oligos. Vinculin is presented as loading control. (I) Distribution of bioYY1 binding
at the promoters of differentially expressed genes in YY1 siRNA-treated ES cells. P-values are determined with a chi-square test. Stacked columns
show the relative distribution of up-regulated (Up) or down-regulated (Down) bioYY1 target genes.



down-regulation. These analyses identified 292 genes that
were differentially expressed, with 95% confidence,
between YY1 and SCR control siRNA-treated ES cells
(Supplementary Table S2). Importantly, ~30% of the
regulated genes present YY1 binding at their TSS. This
proportion is significantly higher than expected (chi-squared
P < 2.2 × 10^-16) and strongly suggests a direct activity of YY1 in
controlling the expression of its target genes (Figure 2I).
Consistent with this, qRT-PCR analysis in cells treated
with YY1-specific siRNA or shRNA targeting sequences
validated the microarray results (Supplementary Figure
S2B). Expression of YY1 target genes was preferentially
diminished upon YY1 depletion, in agreement with an
activating role for YY1 (Figure 2I). Nevertheless,
several transcripts were also up-regulated, suggesting
potential opposing functions for YY1 in transcriptional
control (Figure 2I). Interestingly, the most
up-regulated transcripts in YY1-depleted ES cells are
nuclear and nucleolar small non-coding RNAs
(sncRNA) (Supplementary Table S3). A more detailed
analysis of the whole microarray data identified ~22
RNA transcripts that were differentially regulated in the
absence of YY1 (Figure 3A). Nearly all these RNAs (86%)
were up-regulated upon YY1 depletion (Figure 3A).






[Figure 3, panels A-F: images not reproduced. Recoverable panel labels:
(A) differentially expressed genes split into cRNA and sncRNA, Up and Down,
p-val = 0.00012; (B) heat map of log2 fold difference for sncRNAs and their
annotated transcripts, with bioYY1 binding intensity (Strong, Medium, Low/None)
and position (for example, Intragenic); legible sncRNA rows include Snora73a,
Snord35b, Snord32a, Snora62, Snora31, Rnu3b1, Rnu2, Snord116, Snora21, Rnu12,
Snora34, Rnu3a, Rny3, Mir297b and Mir300; (C) FBio-Ctrl and FBio-Yy1 tracks.]




Figure 3. YY1 negatively regulates sncRNAs intra-cellular levels. (A) Distribution of the differentially expressed coding RNAs (cRNA) and small
non-coding RNAs (sncRNA) present on the Mouse Gene 1.0 ST Affymetrix Array (center column). Differential distributions of up-regulated (Up) or
down-regulated (Down) RNA classes are presented in the left and right columns. P-value is determined by chi-square test. (B) Heat map of the fold
change expression values of the indicated sncRNAs and their annotated transcripts in YY1 RNAi-treated ES cells. Linear fold changes are also
indicated within the heat map boxes. Left boxes indicate the presence of bioYY1 binding and its relative position with respect to transcripts' TSS. The
bioYY1 target sncRNAs are highlighted in green. YY1 binding intensity is defined using peaks' P-value: low >10^-40; medium <10^-40 and >10^-70;
high <10^-70. (C) Genomic snapshots of the bioChIPseq results for YY1 and control (FBio-Ctrl) at the indicated genomic loci. (D) Distribution of
regulated miRNAs upon YY1 down-regulation. Stacked columns show the relative distribution of up-regulated (Up) or down-regulated (Down)
miRNAs. (E) Summary of the number of validated target mRNAs of the indicated miRNAs identified in (D). (F) Functional annotation of miRNA
targets shown in Figure 3E using miRNAs Ingenuity Systems Pathway Analysis. Top scoring pathways are highlighted in red.



This result was significantly different from that expected
(chi-squared P < 0.001), strongly suggesting that YY1
could play a negative role in the biogenesis of these sncRNAs.
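
The chi-squared comparisons reported above (bound versus unbound promoters among regulated genes, and up- versus down-regulated RNA classes) can be illustrated with a minimal sketch. It assumes a plain Pearson chi-squared test on a 2x2 table without continuity correction, which may differ from the exact procedure used for the figures. The function name `chi2_2x2` and all counts are hypothetical placeholders, not values from this study.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Table layout (hypothetical labels):
                        bound   not bound
        regulated         a         b
        not regulated     c         d
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must contain at least one count")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Illustrative counts only (assumptions, not the paper's data).
regulated_bound, regulated_unbound = 90, 210
other_bound, other_unbound = 2900, 32000

stat, p_value = chi2_2x2(regulated_bound, regulated_unbound,
                         other_bound, other_unbound)
observed = regulated_bound / (regulated_bound + regulated_unbound)
expected = (regulated_bound + other_bound) / (
    regulated_bound + regulated_unbound + other_bound + other_unbound)
print(f"observed bound fraction: {observed:.3f}")
print(f"expected bound fraction: {expected:.3f}")
print(f"chi-squared = {stat:.1f}, p = {p_value:.3g}")
```

The test asks whether the bound fraction among regulated genes departs from the fraction expected from the whole gene set; a statistics package would normally replace the hand-written p-value step.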

Since most of these sncRNAs are localized within
intragenic regions and are generated either by independent
transcription units or through splicing of longer
transcripts (55), we reasoned that our direct bioYY1 target
annotation might have missed most of these RNA
species. Thus, we manually annotated the transcript position
of the 22 sncRNAs found in our expression analysis and
established that most of the regulated sncRNAs mapped
within longer transcripts or in close proximity to a TSS
(Figure 3B). Importantly, more than 60% of sncRNAs
presented bioYY1 binding at the promoter of their
associated transcripts (Figure 3C). Intriguingly, although
sncRNA expression increased upon YY1 depletion, whole
transcript expression was unaffected, suggesting a
transcription-independent role of YY1 in regulating
sncRNA levels.

To gain further insight into this observation, we decided
to look at the expression levels of another class of
sncRNAs, mature miRNAs. Using Applied Biosystems
TaqMan technology, we measured the expression of 752
mature miRNA species present in the mouse genome in
two independent experiments. These analyses identified 78
miRNAs that were differentially expressed upon YY1
knock down, of which more than 70% were
up-regulated upon YY1 RNAi, further validating our
previous observations (Figure 3D and Supplementary
Table S4). Functional analysis identified several mRNA
transcripts that had been previously validated as
targets of these miRNAs (summary list shown in
Figure 3E and whole list in Supplementary Table S5),
highlighting a significant functional enrichment in cell
cycle and developmental processes, as well as potential
implications in diseases such as cancer and genetic
disorders (Figure 3F). Overall, these data show that YY1 has
a direct positive action on gene expression but acts
negatively on the accumulation of sncRNAs in ES cells.

These data exclude a functional interaction between
YY1 and PcG proteins and put forward YY1 as a
positive regulator of gene expression in ES cells. Thus,
to gain further insight into YY1 transcriptional activities,
we scanned the bioYY1-bound genomic regions looking
for the enrichment of known protein-DNA binding
motifs. This analysis identified several DNA elements
enriched in proximity of bioYY1 binding sites; in particular,
we found an evident over-representation of binding sites for
ETS transcription factors (Figure 4A). In addition, one of the DNA
motifs with the highest score in the analysis was the DNA
binding site of Zfx (Figure 4A). Zfx is a transcription
factor that plays an important role in ES cell self-renewal
(56) and has been characterized as part of the Myc module of
transcription factors (23,57). To investigate a potential overlap
between Zfx and YY1, we analyzed the distribution of
bioYY1 and Zfx ChIPseq signals across their binding
sites and found that a large part of bioYY1-bound
regions were also occupied by Zfx (~70%) (Figure 4B).
The density profiles of the ChIPseq reads of both transcription
factors displayed a very similar distribution, demonstrating
that the regions of association were in proximity to each
other and validating the data of the DNA motif analysis
(Figure 4A). Analysis of bioYY1 and Zfx direct target
genes further confirmed this result, showing a highly
significant overlap between bioYY1 and Zfx targets
(Figure 4C). Examples of bioYY1 and Zfx binding
profiles at target sites are presented in Figure 4D.
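
The target-gene overlap analysis can be sketched as follows, under the definitions given in the Figure 4 legend: a gene counts as a target when at least one peak lies within ±2 kb of its annotated TSS, and the overlap P-value comes from the hypergeometric distribution. The function names, gene names and coordinates below are hypothetical, and the choice of gene universe (here, every gene in the toy annotation) is an assumption.

```python
import math

def target_genes(peak_summits, tss_by_gene, window=2000):
    """Genes with at least one peak summit within +/- window bp of their TSS.

    peak_summits: iterable of (chromosome, position)
    tss_by_gene:  dict mapping gene name -> (chromosome, TSS position)
    """
    by_chrom = {}
    for chrom, pos in peak_summits:
        by_chrom.setdefault(chrom, []).append(pos)
    targets = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        if any(abs(pos - tss) <= window for pos in by_chrom.get(chrom, ())):
            targets.add(gene)
    return targets

def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hypergeom_overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """P(overlap >= n_overlap) when n_b genes are drawn at random from a
    universe of n_universe genes that contains n_a targets of factor A."""
    if not (0 <= n_a <= n_universe and 0 <= n_b <= n_universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    upper = min(n_a, n_b)
    if n_overlap > upper:
        return 0.0
    lower = max(n_overlap, n_a + n_b - n_universe, 0)
    log_total = _log_comb(n_universe, n_b)
    # Sum the upper tail in log space to stay stable for large gene sets.
    log_terms = [
        _log_comb(n_a, k) + _log_comb(n_universe - n_a, n_b - k) - log_total
        for k in range(lower, upper + 1)
    ]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))

# Toy annotation and peak lists (hypothetical values).
tss = {
    "GeneA": ("chr1", 1000), "GeneB": ("chr1", 50000),
    "GeneC": ("chr2", 7000), "GeneD": ("chr2", 90000),
    "GeneE": ("chr3", 3000),
}
yy1_peaks = [("chr1", 1200), ("chr2", 8500), ("chr3", 9000)]
zfx_peaks = [("chr1", 400), ("chr2", 6100), ("chr2", 91000)]

yy1_targets = target_genes(yy1_peaks, tss)
zfx_targets = target_genes(zfx_peaks, tss)
shared = yy1_targets & zfx_targets
p_overlap = hypergeom_overlap_pvalue(
    len(tss), len(yy1_targets), len(zfx_targets), len(shared))
print(sorted(shared), p_overlap)
```

With genome-scale gene sets the same functions apply unchanged; only the universe size and the peak lists differ.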

Interestingly, two subunits of the INO80 complex that
co-purified with bioYY1, Ruvbl1 and Ruvbl2, were also
shown to stably interact with a component of the
Myc-Max complex, Dmap1 (Figure 2E and F) (23).
Moreover, previous reports proposed a connection
between YY1 and Myc activities in defined conditions
(58-60) that, together with the high degree of overlap
between Zfx and bioYY1 genomic association (Figure
4A-C), could suggest a functional overlap between YY1
and Myc functions. To test this, we analyzed the
genome-wide binding profile of Myc transcription
factors relative to bioYY1 binding. The density profiles
of bioYY1 together with c-Myc and n-Myc ChIPseq
signals showed a large degree of overlap. Similar to Zfx,
both n-Myc and c-Myc bind in proximity of bioYY1 and
define different clusters of binding regions either
co-occupied by Myc and YY1 or occupied by the proteins alone
(Figure 4E). Examples of the binding profiles of these
regions are shown in Figure 4F. As for Zfx,
analysis of Myc and bioYY1 co-occupancy at target
genes revealed a large degree of overlap between
bioYY1- and Myc-bound promoters (Figure 4G).
Importantly, the largest group of overlapping genes is
simultaneously bound by bioYY1, c-Myc and n-Myc (Figure
4H). YY1 and Myc co-occupancy was further validated at
the endogenous level in both wild-type and BirA-ES cells at
several YY1 target genes (Supplementary Figure S4A-C).
Consistent with the lack of interaction between Myc and
YY1 observed in the MS analyses, loss of YY1 expression
did not alter Myc binding at co-occupied promoters
(Supplementary Figure S5A).

The discovery of YY1 and Myc co-occupancy prompted
us to explore the properties of Myc-bound YY1 promoters.
We carried out a DNA binding site prediction
on the promoter regions that are co-occupied by bioYY1
and Myc proteins relative to promoters that present
the binding of either transcription factor alone.
This analysis identified three different clusters of
significantly enriched DNA binding motifs. Among the DNA
binding motifs enriched in the YY1-Myc cluster, together
with previously identified Smad and Elk binding sites
(Figure 4A), we found that sites for different E2f
transcription factors were also over-represented in this group
(Figure 5A). To extend these findings, we carried out an
additional analysis that scanned for DNA binding motifs
preferentially associated with promoters co-occupied by
Myc and bioYY1 relative to the ones excluded from
Myc binding. This analysis identified, together with the
Myc DNA binding site (E-BOX, green box), binding sites
for Zfp161, Gmeb1 and different E2f proteins (sky blue
boxes) (Figure 5B). In contrast, non-Myc YY1 targets
(that represent the smaller fraction of bioYY1 target
promoters) were strongly enriched in A-/T-rich DNA binding
motifs, among which binding sites for homeobox-related
transcription factors (yellow boxes) were extensively
represented (Figure 5B). The specificity of this result is
further supported by the preferential association of
bioYY1 with CG-rich promoters (Figure 2E) and suggests
a potential Myc-independent cooperation between homeobox
factors and YY1 on a specific set of target genes.

Since E2f factors were previously characterized as
part of the Myc transcription module in ES cells (23,57),
since E2f activity was linked to YY1 (61) and since E2f
binding sites were always enriched in our DNA motif
discovery analyses, we extended our genome-wide analysis to
E2f1. Analysis of E2f1 ChIPseq data from ES cells
revealed an extensive overlap with bioYY1 and Myc
binding sites (Figure 5C). Examples of binding profiles
for these data are presented in Figure 5D and validations
of endogenous YY1, c-Myc and E2f1 co-occupancy at
target sites are shown in Supplementary Figures S4A-C
and S5A. Altogether, these findings strongly suggest an
extensive cooperation between YY1 and transcription
factors previously characterized within the Myc
transcriptional module. An overall analysis of these data is
summarized in Figure 6A and shows the existence of
different clusters of genomic loci characterized by different
combinations of occupancy for these five transcription
factors. Examples of the binding profiles at target genes
by the different transcription factors are presented in
Figure 6B. Importantly, the group co-occupied by all
transcription factors represents the largest group of bioYY1
targets (Figure 6C and Supplementary Table S6). This
result is significantly different from expected and
supports a transcriptional cooperation between YY1 and
Myc-related transcription factors. To test this, we generated a
correlation map of the ChIPseq binding profiles between
bioYY1, the components of the core pluripotency module
(Oct4, Nanog and Sox2) and of the Myc module (c-Myc,

Figure 4. YY1 binding sites overlap with Zfx and Myc occupancy. (A) Result of Clover analysis on bioYY1 target promoters. Only the motifs with a
score >50 are presented in the table. P-values of each motif relative to the indicated reference set are presented in addition to the score values.
(B) K-means clustering of read intensities in bioYY1 and Zfx ChIPseq data on all bioYY1- and Zfx-associated genomic loci within a 4 kb region
centered on peaks' summits. (C) Overlap between the target genes of the indicated ChIPseq datasets. Target genes are defined by the presence of at
least one ChIPseq peak within ±2 kb from genes' annotated TSS. P-values of the indicated overlaps are determined by hypergeometric distribution.
(D) Examples of the genomic snapshots generated in bioYY1 and Zfx ChIPseq. (E) As in (B) using bioYY1-, c-Myc- and n-Myc-bound genomic loci.
(F) Examples of the genomic snapshots generated in bioYY1, c-Myc and n-Myc ChIPseq. (G) As in (C) with bioYY1, c-Myc and n-Myc target
genes. (H) Distribution of YY1 promoter association relative to c-Myc and n-Myc co-occupancy at target genes. Chi-square test determines the
P-value.



n-Myc, Zfx and E2f1) (Figure 6D). These analyses clearly
demonstrate that bioYY1 binding has a strong correlation
with components of the Myc module but does not correlate
in binding with core pluripotency factors, indicating that
YY1 is part of the Myc-related transcription network.
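
A correlation map of this kind can be sketched as a matrix of pairwise correlations between binding profiles measured over a shared set of genomic loci. The text does not state which correlation coefficient was used, so the sketch below assumes Pearson correlation; the function names and the signal values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long binding profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def correlation_map(profiles):
    """Pairwise correlation matrix as a nested dict: result[a][b]."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names}
            for a in names}

# Hypothetical ChIPseq signal at the same eight loci for three factors.
profiles = {
    "bioYY1": [9, 8, 7, 1, 0, 1, 8, 0],
    "c-Myc":  [8, 9, 6, 0, 1, 0, 7, 1],
    "Oct4":   [0, 1, 0, 9, 8, 7, 1, 6],
}
corr = correlation_map(profiles)
for name, row in corr.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Factors that occupy the same loci give coefficients near 1, while factors that occupy distinct loci give low or negative values, which is the pattern the map is used to separate.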



In order to correlate transcription factor co-occupancy
at YY1-bound promoters with gene transcriptional
activity, we compared microarray expression data from
ES cells with the different classes of bioYY1 target
genes. From this analysis, we concluded that cumulative
binding of the different transcription factors with YY1
potentiates gene expression (Figure 6E). Interestingly,
although Myc and E2f1 binding alone have a positive
transcriptional effect, Zfx does not seem to influence
gene expression significantly. Nevertheless, cumulative
binding of all three classes of transcription factors at
bioYY1 target genes correlates with maximal transcriptional
activity (Figure 6E, boxplots). A comparative analysis
of the same classes of target genes relative to bioYY1
association reveals that the presence of YY1 potentiates
gene expression when combined with Myc and/or E2f1
and induces the maximum expression level when associated
simultaneously with all the other transcription factors
(Figure 6E, bottom heat map and Supplementary Figure
S5B). Simultaneous loss of c-Myc and n-Myc activity
induces loss of ES cell pluripotency and activation of
differentiation programs (62). Consistent with this,
stable shRNA-mediated reduction of YY1 expression
in ES cells led to an increased expression of differentiation
markers and defects in the activation of some
lineage-specific genes upon differentiation in embryoid
bodies (EB) (Supplementary Figure S6A and S6B).
Importantly, EB differentiation induced a counter-selection
for YY1 knock down efficiency (Supplementary
Figure S6A), in agreement with a potential role for YY1
in regulating proper ES cell differentiation that is consistent
with the severe early implantation defects observed in
YY1 KO embryos. Overall, these data demonstrate a functional
role of YY1 in genome-wide Myc transcriptional
functions, proposing a positive effect on the expression
of genes that belong to the Myc transcriptional network.



DISCUSSION 

Determination of cellular fate is a complex and still poorly
understood process that is controlled at the transcriptional
level by the activity of tissue-specific transcription factors
(21). ES cells are the cell type with the highest differentiation
plasticity (pluripotency) that can be isolated in tissue
culture (63). Moreover, several publications have
proposed that cancer cells share specific transcriptional
features with ES cells (ESC-like signatures) (64,65). This
makes the characterization of the transcriptional mechanisms
that control ES self-renewal and differentiation
important not only to understand their identity and plasticity
but also to characterize transcriptional features shared
with cancer cells within a physiological context. In line
with this, distinct transcriptional modules have been
proposed to control ES transcription programs. These
include the repressive PcG module, the Myc module
and the core pluripotency factors module (23). Not
surprisingly, all components of these networks were shown to
play important roles in controlling ES self-renewal and
differentiation capabilities (66,67). Moreover, several of
these factors are also strongly implicated in tumor
development: the best examples are represented by the frequent
activation of PcG and Myc functions (25,59).

Here, we have presented a detailed characterization of
YY1 functions in mouse ES cells. We show that YY1
does not share physical and functional properties with
PcG proteins, while it preferentially associates with the
CG-rich promoters of actively transcribed genes that
correspond to non-bivalent, H3K4me3-enriched and H3K27
hyper-acetylated promoters. This does not exclude that
YY1 could play a role in PcG recruitment in particular
situations, but our data clearly show that this activity, if
it occurs, must be restricted to very defined circumstances.
Consistent with this, as in cancer cells (12,13), YY1
stably associates with components of the INO80 remodeling
complex, as well as with newly identified RNA helicase
activities. YY1 RNAi experiments further confirm these
observations, showing that loss of YY1 preferentially leads
to a down-regulation of gene expression. Although only a
minority of direct bioYY1 target genes was impaired in
expression, this suggests that YY1 binding, together with
INO80 and RNA helicase activities, facilitates gene
expression, in agreement with their co-occupancy at different
target promoters. This is in line with the YY1-INO80
co-occupancy observed at the Cdc6 promoter in cancer cells
(12). Lack of significant transcriptional changes in the
majority of bioYY1 target genes could be due either
to the partial knock down that we obtained with
both YY1 siRNA (Figure 2H) and shRNA treatments
(Supplementary Figures S2B, S5A and S6A) or to
compensatory effects mediated by the other transcription
factors. Moreover, the YY1-INO80 interaction has
also been proposed to play a role in the DNA homologous
recombination that occurs in case of DNA damage (13). Due to
its ability to associate directly with the DNA cruciform
structure of Holliday junctions, YY1 has been proposed to
function as a bridge for the recruitment of INO80
remodeling activities at damaged sites (13). Such DNA
binding activity does not seem to require YY1 DNA
binding motifs, at odds with the tight association
between the ChIPseq results and the YY1 DNA binding
elements (Figure 3D). Nevertheless, the increased genomic
instability observed in hypomorphic YY1 KO mouse
embryonic fibroblasts supports the hypothesis that YY1
could exert this function (13). However, our knock
down studies do not show any evident genomic instability
in YY1-depleted ES cells (data not shown), either because
of the transient nature of the siRNA-based experiments
we performed or because the YY1-DNA repair activities
are suppressed by the lack of cell cycle checkpoints and the


Figure 6. Continued

indicated ChIPseq dataset. Two independent Oct4 datasets are included in the analysis. The dataset references are indicated in the 'Materials and
Methods' section. (E) Box plots of the expression of bioYY1 target genes relative to the co-occupancy of the indicated proteins. P-value was
determined by Kruskal-Wallis test. Bottom intensity map highlights the significance of the contribution of bioYY1 binding relative to non-YY1
binding to target gene clusters defined by the co-occupancy of the indicated proteins. P-values are indicated within the boxes and are determined by
Wilcoxon test. (F) Model of YY1, Myc, Zfx and E2f1 co-occupancy at target promoters of transcribed genes. The model includes data from previous
purifications of Myc complexes from ES cells (23) and speculates on potential transcriptional and post-transcriptional activities of YY1.

high rate of apoptosis present in mouse ES cells (68,69).
Finally, the genome-wide combination of expression and
localization analyses that we have performed
strengthens the case for a direct role of YY1 in regulating the
expression of its target genes. Although a significant portion of
transcripts increase in expression in the absence of YY1 activity,
the finding that the most up-regulated transcripts belong
to classes of sncRNAs is very surprising and intriguing.
It is not clear whether YY1 binding plays a role in regulating
distinct transcriptional units of sncRNAs or whether it plays
a role in the biogenesis of these mature RNAs. The direct
binding of YY1 to the promoters of most of the
mRNAs associated with these sncRNAs might suggest either that
YY1 binding suppresses the activation of alternative
sncRNA transcription units or that YY1, together with
the INO80 remodeling and the RNA helicase activities,
can have a negative effect on sncRNA processing, hence
altering their stability. It is also possible that sncRNA
up-regulation is an indirect effect of a stress response
caused by YY1 down-regulation. However, the direct
binding of YY1 in proximity of sncRNA genomic loci
could support a direct role of YY1 in regulating their
cellular levels.

Both our findings, that YY1 does not share functional
properties with PcG activities and that it instead shares
global regulatory functions with the Myc-related
transcriptional network, are in line with other
observations. YY1 was shown to possess the ability, in
non-physiological conditions, to interact with Myc
(58,59). The lack of Myc protein in our YY1 purification,
as well as in others performed in cancer cells (12,13),
clearly shows that, if Myc interacts physiologically with
YY1, such binding is not stable. Nevertheless, this could
be important to stabilize the genome-wide DNA-mediated
co-occupancy of YY1 and Myc at their shared binding
sites. Similarly, E2f1 was also shown to interact
in vitro with YY1 (61) but, like Myc, was not found in
the YY1 purifications. In addition, YY1 was also shown
to directly activate expression from the Myc promoter and
to increase, when overexpressed, Myc endogenous
transcripts (70,71). In ES cells, we do not observe binding of
YY1 at any Myc paralog promoter, nor do we detect
significant changes in Myc expression
(Supplementary Table S2), suggesting that this regulatory
mechanism is not conserved in ES cells.

Our data on the transcriptional cooperation between
YY1 and the Myc module of transcription factors are in
line with previous observations. For example, YY1 was
shown to cooperate with Myc in activating Surf-1
expression (60), while it was shown to act synergistically with
E2F1 in activating the Cdc6 promoter (61). The c-Myc
and n-Myc KO mice are both early embryonic lethal
(E10.5 and E11.5, respectively) (72-74) and
double Myc KO ES cells activate the expression of differentiation
genes and lose their pluripotency (62). Similarly,
reduction of YY1 expression induces the activation of
differentiation markers and genes related to embryonic
development. Moreover, differentiation of these cells
induced a strong counter-selection for YY1 knock down
efficiency that is suggestive of a potential role for YY1 in
differentiation, consistent with its essential role in early
implantation development. Generation of inducible YY1
KO ES cells will therefore be an invaluable tool to extend
these observations. Importantly, in addition to the essential
function in regulating normal development and
differentiation, YY1 and Myc are both activated in
NDEA-induced hepatocarcinogenesis (75), as well as in
Burkitt lymphomas (76) and in prostate cancers (77).
Interestingly, YY1 and Zfx were identified among six
proteins that scored in a proteomic study aimed at
identifying proteins expressed in neoplastic nodes of diffuse
large B cell and follicular lymphomas (78), two tumor
types that are frequently driven by Myc amplifications
and overexpression (24). Altogether, our data on the
global transcriptional cooperation between YY1 and the
Myc transcription network (summarized in Figure 6F)
make a big step in putting together little pieces of
observations dispersed over several years of literature and
identify a novel component of a keystone transcriptional
regulatory network of normal pluripotent and cancer cells.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-7 and Supplementary Figures 
1-6.